230 ◾ Bioinformatics
and install R. You can also download and install RStudio by following the instructions
at “https://www.rstudio.com/products/rstudio/download/”. Once you have R installed on
your computer, run R, and on R prompt, run the following to install the Bioconductor
packages required for the remaining ChIP-Seq data analysis:
if (!require(“BiocManager”, quietly = TRUE))
install.packages(“BiocManager”)
BiocManager::install(“clusterProfiler”)
BiocManager::install(“ChIPseeker”)
BiocManager::install(“TxDb.Hsapiens.UCSC.hg19.knownGene”)
BiocManager::install(“EnsDb.Hsapiens.v75”)
BiocManager::install(“clusterProfiler”)
BiocManager::install(“AnnotationDbi”)
BiocManager::install(“org.Hs.eg.db”)
nstall.packages(“dplyr”)
After installing the packages, load them as follows:
library(clusterProfiler)
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(EnsDb.Hsapiens.v75)
library(AnnotationDbi)
library(org.Hs.eg.db)
library(“dplyr”)
Now, we are ready to finish the remaining analysis in R. Copy “chip1_peaks.narrowPeak”,
“chip2_peaks.narrowPeak”, and “chip3_peaks.narrowPeak” files produced by MACS3
above to a directory that you can browse from inside R. Use R to choose your working
directory where those three files are copied. In the following, we can use ChIPseeker[9]
package functions to create different plots.
6.3.6.1 ChIP-Seq Peaks’ Coverage Plot
The “covplot()” function is used to create a plot that shows peak distribution over the
whole genome. This function calculates the coverage of peak regions over chromosomes
or regions of chromosomes and generates a peaks’ coverage plot. The function requires
the peak data as Granges (genomic ranges) object (see ChIPseeker documentations). The
following codes load each file as a data frame, add column names, create a Granges object,
and finally create the ChIP-Seq peaks’ coverage plots for each sample:
peaks1 <- read.table(“chip1_peaks.narrowPeak”,header=FALSE)
colnames <- c(“chrom”, “start”, “end”, “name”, “score”,”strand”,
“signal”, “pvalue”, “qvalue”, “peak”)
colnames(peaks1) <- colnames